Prosper Marketplace is America’s first peer-to-peer lending marketplace, with over $7 billion in funded loans. Here, borrowers can request personal loans and investors can fund anywhere from $2,000 to $35,000 per loan request. I am interested in the features regarding to borrowers, lenders and Prosper.org. hence, the next we’re going to explore and the prosper loan dataset.
## [1] 113937 81
There are 113937 observations and 81 variables in the data set.
## [1] "ListingKey"
## [2] "ListingNumber"
## [3] "ListingCreationDate"
## [4] "CreditGrade"
## [5] "Term"
## [6] "LoanStatus"
## [7] "ClosedDate"
## [8] "BorrowerAPR"
## [9] "BorrowerRate"
## [10] "LenderYield"
## [11] "EstimatedEffectiveYield"
## [12] "EstimatedLoss"
## [13] "EstimatedReturn"
## [14] "ProsperRating..numeric."
## [15] "ProsperRating..Alpha."
## [16] "ProsperScore"
## [17] "ListingCategory..numeric."
## [18] "BorrowerState"
## [19] "Occupation"
## [20] "EmploymentStatus"
## [21] "EmploymentStatusDuration"
## [22] "IsBorrowerHomeowner"
## [23] "CurrentlyInGroup"
## [24] "GroupKey"
## [25] "DateCreditPulled"
## [26] "CreditScoreRangeLower"
## [27] "CreditScoreRangeUpper"
## [28] "FirstRecordedCreditLine"
## [29] "CurrentCreditLines"
## [30] "OpenCreditLines"
## [31] "TotalCreditLinespast7years"
## [32] "OpenRevolvingAccounts"
## [33] "OpenRevolvingMonthlyPayment"
## [34] "InquiriesLast6Months"
## [35] "TotalInquiries"
## [36] "CurrentDelinquencies"
## [37] "AmountDelinquent"
## [38] "DelinquenciesLast7Years"
## [39] "PublicRecordsLast10Years"
## [40] "PublicRecordsLast12Months"
## [41] "RevolvingCreditBalance"
## [42] "BankcardUtilization"
## [43] "AvailableBankcardCredit"
## [44] "TotalTrades"
## [45] "TradesNeverDelinquent..percentage."
## [46] "TradesOpenedLast6Months"
## [47] "DebtToIncomeRatio"
## [48] "IncomeRange"
## [49] "IncomeVerifiable"
## [50] "StatedMonthlyIncome"
## [51] "LoanKey"
## [52] "TotalProsperLoans"
## [53] "TotalProsperPaymentsBilled"
## [54] "OnTimeProsperPayments"
## [55] "ProsperPaymentsLessThanOneMonthLate"
## [56] "ProsperPaymentsOneMonthPlusLate"
## [57] "ProsperPrincipalBorrowed"
## [58] "ProsperPrincipalOutstanding"
## [59] "ScorexChangeAtTimeOfListing"
## [60] "LoanCurrentDaysDelinquent"
## [61] "LoanFirstDefaultedCycleNumber"
## [62] "LoanMonthsSinceOrigination"
## [63] "LoanNumber"
## [64] "LoanOriginalAmount"
## [65] "LoanOriginationDate"
## [66] "LoanOriginationQuarter"
## [67] "MemberKey"
## [68] "MonthlyLoanPayment"
## [69] "LP_CustomerPayments"
## [70] "LP_CustomerPrincipalPayments"
## [71] "LP_InterestandFees"
## [72] "LP_ServiceFees"
## [73] "LP_CollectionFees"
## [74] "LP_GrossPrincipalLoss"
## [75] "LP_NetPrincipalLoss"
## [76] "LP_NonPrincipalRecoverypayments"
## [77] "PercentFunded"
## [78] "Recommendations"
## [79] "InvestmentFromFriendsCount"
## [80] "InvestmentFromFriendsAmount"
## [81] "Investors"
All features of this dataset prosperLoanData
Due to many variables in this dataset, I only use 15 variables. Those are:
## [1] 113937 15
## 'data.frame': 113937 obs. of 15 variables:
## $ Term : int 36 36 36 36 36 60 36 36 36 36 ...
## $ BorrowerAPR : num 0.165 0.12 0.283 0.125 0.246 ...
## $ BorrowerRate : num 0.158 0.092 0.275 0.0974 0.2085 ...
## $ LenderYield : num 0.138 0.082 0.24 0.0874 0.1985 ...
## $ ProsperRating..numeric. : int NA 6 NA 6 3 5 2 4 7 7 ...
## $ ProsperRating..Alpha. : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
## $ ListingCategory..numeric.: int 0 2 0 16 2 1 1 2 7 7 ...
## $ Occupation : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
## $ EmploymentStatus : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
## $ LoanOriginalAmount : int 9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
## $ IncomeRange : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
## $ CurrentDelinquencies : int 2 0 1 4 0 0 0 0 0 0 ...
## $ LoanOriginationDate : Date, format: "2007-09-12" "2014-03-03" ...
## $ LoanOriginationQuarter : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
## $ Investors : int 258 1 41 158 20 1 1 1 1 1 ...
In this section, I want to explore the dataset regarding to borrowers created in prosper, the Loan Term they are more likely to select, the prosper rating, the borrowers’ occupation and their annual incomes, listing category,borrower’s interest rate, lender yield, loan amount, borrowers late payment and the investors distributions.
## 12 36 60
## 1614 87778 24545
There are almost 80% of the borrowers more likely to choose 3-year fixed loan term while they creating the listing.
## [1] "Q1 2011" "Q4 2012"
There are 1614 loan listing with fixed Term 1 year, those listing are created from 2010 to 2013, I am not sure if this part of data is correct or not? as from prosper.com, that only support Fixed term—3 or 5 years.
## Other Professional
## 28617 13628
## Computer Programmer Executive
## 4478 4311
## Teacher Administrative Assistant
## 3759 3688
## Analyst
## 3602 3588
## Sales - Commission Accountant/CPA
## 3446 3233
## Clerical Sales - Retail
## 3164 2797
## Skilled Labor Retail Management
## 2746 2602
## Nurse (RN) Construction
## 2489 1790
## Truck Driver Laborer
## 1675 1595
## Police Officer/Correction Officer Civil Service
## 1578 1457
## Engineer - Mechanical Military Enlisted
## 1406 1272
## Food Service Management Engineer - Electrical
## 1239 1125
## Food Service Medical Technician
## 1123 1117
## Attorney Tradesman - Mechanic
## 1046 951
## Social Worker Postal Service
## 741 627
## Professor Realtor
## 557 543
## Doctor Nurse (LPN)
## 494 492
## Nurse's Aide Tradesman - Electrician
## 491 477
## Waiter/Waitress Fireman
## 436 422
## Scientist Military Officer
## 372 346
## Bus Driver Principal
## 316 312
## Teacher's Aide Pharmacist
## 276 257
## Student - College Graduate Student Landscaping
## 245 236
## Engineer - Chemical Investor
## 225 214
## Architect Pilot - Private/Commercial
## 213 199
## Clergy Student - College Senior
## 196 188
## Car Dealer Chemist
## 180 145
## Psychologist Biologist
## 145 125
## Religious Flight Attendant
## 124 123
## Homemaker Tradesman - Carpenter
## 120 120
## Student - College Junior Tradesman - Plumber
## 112 102
## Student - College Sophomore Dentist
## 69 68
## Student - College Freshman Student - Community College
## 41 28
## Judge Student - Technical School
## 22 16
The largest number is Other, it is understandable, because all cases not in above can be Other. If we don’t consider the “Other” case, Professional has the most top count and almost three times as many as the second. The count of borrowers on the top three is professional, Computer Programer, Executive, at the bottom of the list are the students, generally, the students don’t need to borrow money, or they don’t have strong credit rating to create loan request from prosper.
## A AA B C D E HR
## 29084 14551 5372 15581 18345 14274 9795 6935
Prosper Ratings, from lowest-risk to highest-risk, are labeled AA, A, B, C, D, E, and HR (“High Risk”), we can see 29084 prosper rating is labeled “NA”,this is understandable since the prosper rating is provided since 2009 after SEC registration, there is no rating labeled to the borrowers before 2009. If we do not consider the “NA”, the prosper rating appears Binomial distribution, 57% of borrowers are labeled middle risk B,C and D. Borrowers’s rating C are the highest frequency up to 22%.
## $0 $1-24,999 $100,000+ $25,000-49,999 $50,000-74,999
## 621 7274 17337 32192 31050
## $75,000-99,999 Not displayed Not employed
## 16916 7741 806
The borrower’s income ranges appears Binomial distribution. most of borrowers’ annual income are fall into $25,000 - 75,000. and there are 621 borrowers with zero incomes they got loans, it’s default value if prosper not get the source of borrower’s incomes, or still the borrowers can get loans if they have good credit rating from other path.
## Employed Full-time Not available Not employed
## 2255 67322 26355 5347 835
## Other Part-time Retired Self-employed
## 3806 1088 795 6134
## [1] 11 15
Further exploring the borrowers whose incomes are zero, find 11 borrowers are Not employed, they may not have incomes? all the loan requests were created on Q3 2007. I guess it should be data issue on the person with zero annual income.
## 1 0 7 2 3 6 4 13 15 18 14 20
## 58308 16965 10494 7433 7189 2572 2395 1996 1522 885 876 771
## 19 5 16 11 8 10 9 12 17
## 768 756 304 217 199 91 85 59 52
Category: The category of the listing that the borrower selected when posting their listing: 0 - Not Available, 1 - Debt Consolidation, 2 - Home Improvement, 3 - Business, 4 - Personal Loan, 5 - Student Use, 6 - Auto, 7- Other, 8 - Baby&Adoption, 9 - Boat, 10 - Cosmetic Procedure, 11 - Engagement Ring, 12 - Green Loans, 13 - Household Expenses, 14 - Large Purchases, 15 - Medical/Dental, 16 - Motorcycle, 17 - RV, 18 - Taxes, 19 - Vacation, 20 - Wedding Loans.
Out of the listing category, three categories comes out as the highest past 10000 loans. There’s N/A and Other categories, so we can’t know for sure the specific category. But one comes out highest which is category 1 - Debt Consolidation, where’s one take out a loan to pay many others. This comes really high with 58308 loans,it could be that many Prosper visitors comes with already have loans, and want to search some loans to pay for it?
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1340 0.1840 0.1928 0.2500 0.4975
The borrower’s rate follow an almost unimodal distribution, with the peak around 0.16. There’s small spike occurs around 0.3.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0100 0.1242 0.1730 0.1827 0.2400 0.4925
It’s very similar to the borrower’s rate, the lender’s yield also follow an almost unimodal distribution, with the peak around 0.14. There’s small spike occurs around 0.3.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6500 8337 12000 35000
This is interesting, here we see that the distribution is right skewed, with the peak USD 3500, then some small spikes occur around 10000, 20000, 30000 dollars.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.0000 0.0000 0.5921 0.0000 83.0000 697
Here we can see that there is 75% of borrowers no delinquencie record on prosper peer-to-peer lending marketplace, to my surprise, we found the maximum elinquencie up to 83. It’s really high.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 2.00 44.00 80.48 115.00 1189.00
We see that 25% of listing are funded by 1 or 2 investors, most of listing are less than 40 investors.
There are 113937 observations and 81 variables in the raw data set. Due to many variables in this dataset, I only select 15 features from dataset prosperLoanData.
There are almost 80% of the borrowers are more likely to select 3-year fixed loan while they creating the list. Why it’s 3-year? is it related to BorrowerRate? Is longer term caused higher BorrowerRate? I will explore in the following section later.
If we don’t consider the “Other” case, Professional has the most top count and almost three times as many as the second (I am thinking Professional is not exactly accurate, because many occupations can be Professinal). The count of borrowers on the top three are Professinal, Computer Programer and Executive, at the bottom of the list are the students.
The prosper rating appears Binomial distribution, 57% of borrowers are labeled middle risk B,C and D. Borrowers with rating C is the highest frequency up to 22%.
The borrowers’ income ranges appears Binomial distribution. 78% of borrowers’ annual income are fall into $25,000 - 75,000. and there are 621 borrowers with zero incomes they got loans, further exploring the borrowers whose incomes are zero, only find 11 borrowers are Not employed while they creating list on prosper, I guess it should be data issue on the person with zero annual income.
Out of the listing category, three categories comes out as the highest past 10000 loans. There’s N/A and Other categories, so we can’t know for sure the specific category. But one comes out highest which is category 1 - Debt Consolidation, where’s one take out a loan to pay many others. This comes really high with 58308 loans, it could be that many Prosper visitors comes with already have loans, and want to search some loans to pay for it?
##
## Pearson's product-moment correlation
##
## data: as.numeric(ProsperRating..Alpha.) and BorrowerRate
## t = 917.37, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9524846 0.9537172
## sample estimates:
## cor
## 0.9531049
AA LOWEST risk-> HR HIGHEST risk
We see that the higher risk labeled to borrowers,the higher borrower rate they would cost. And we see that it has a strong positive linear relationship. This is can be explained as the interest rate is the one where the loaner has benefit from the money that he/she loans.
##
## Pearson's product-moment correlation
##
## data: as.numeric(ProsperRating..Alpha.) and LoanOriginalAmount
## t = -138.17, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4340341 -0.4230486
## sample estimates:
## cor
## -0.4285572
It shows rating E and HR has smaller IQR compared to the rest of the rating, rating B and C has no outliner, rating A,B has a smaller outliner, rating AA has two outline only. With a correlation of -0.43, it shows a negative moderately strong linear relationship, as the risk rating increase, the lower amount would be borrowed from prosper.
The distribution has a bell shape and is right-skewed, we can see that investors prefer to loan to those borrowers holding lower risk rating. This is understandable since loan to those person with higher risk may be higher loss the money, even though the lenders would have higher yield by lending to higher risk case (see the next plot).
##
## Pearson's product-moment correlation
##
## data: as.numeric(ProsperRating..Alpha.) and LenderYield
## t = 917.52, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9524993 0.9537315
## sample estimates:
## cor
## 0.9531194
We see that it has a strong positive linear relationship between risk rating and lenderYield, that means that if the lenders want to have higher yield, they would take on more risk of lose rate.
The LeanderYield quickly increase in the initial 1-year, then in a stable line after 2006. But the thing that surprised me is there was no LenderYield from Q3 2008 to Q2 2009. Then I searched the following from wikipedia.
On November 24, 2008, the SEC found Prosper to be in violation of the Securities Act of 1933. As a result of these findings, the SEC imposed a cease and desist order on Prosper. Due primarily to the novel nature of the peer-to-peer lending models, the SEC, after review, now treats all peer-to-peer lending transactions as sales of securities and requires that all platforms register with the SEC.
In July 2009, Prosper reopened their website for lending (“investing”) and borrowing after having obtained SEC registration for its loans. Hence, that can explain why there were a continuous gap period on lender yield.
There is a continuous gap period Prosper was not allowed to run their business until it had obtained SEC registration for its loans. The lender yield is stale in the years.
We observed strong positive/negative linear correlation relationship between prosper risk rating and borrower rate, loan amount can request and lenderYield.
Borrowers holds lower risk rating (AA highest -> HR lowest) in prosper would 1. enjoy lower interest rates. 2. borrow more limit amount. 3. more popular with investors.
Lenders would have higher yield, if they are willing to take on higher risk of lose rate.
In the previous section we explored there is a strong positive linear correlation relationship between prosper rating and interest rate. here, by adding one more variable Term, we see that for each rating, AA to D, looks the longer loan Term would cause the higher interest rate.
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and BorrowerRate
## t = -117.58, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3341283 -0.3237719
## sample estimates:
## cor
## -0.3289599
This scatter plot shows the relationship of Loan amount, borrow rate and prosper rating. Rating AA,A and B looks can loan amount up to $35000, rating C less than $25000, rating D less than $15000, rating E less than $10000, and the rating highest risk rating HR less than $5000, seeing the plot on y-axis, the interest rate is increasing from risk rating AA to HR.
We can see investors are more likely to invest in low risk request, not the amount of the loan, the investors on AA rating looks obviously higher than others.
## substr(prosper$LoanOriginationQuarter, 1, 4): Q1 2
## [1] 35000
## --------------------------------------------------------
## substr(prosper$LoanOriginationQuarter, 1, 4): Q2 2
## [1] 35000
## --------------------------------------------------------
## substr(prosper$LoanOriginationQuarter, 1, 4): Q3 2
## [1] 35000
## --------------------------------------------------------
## substr(prosper$LoanOriginationQuarter, 1, 4): Q4 2
## [1] 35000
## prosper$ProsperRating..Alpha.: AA
## [1] 35000
## --------------------------------------------------------
## prosper$ProsperRating..Alpha.: A
## [1] 35000
## --------------------------------------------------------
## prosper$ProsperRating..Alpha.: B
## [1] 35000
## --------------------------------------------------------
## prosper$ProsperRating..Alpha.: C
## [1] 25000
## --------------------------------------------------------
## prosper$ProsperRating..Alpha.: D
## [1] 15000
## --------------------------------------------------------
## prosper$ProsperRating..Alpha.: E
## [1] 15900
## --------------------------------------------------------
## prosper$ProsperRating..Alpha.: HR
## [1] 16800
This plot shows that the maximum loan amount in prosper website is $15000 in initial year 2015, $25000 before Q2 2013 and $35000 after that. There should be no business run from Q3 2008 to Q2 2009 on prosper, after colored by risk rating, we also found there is no risk rating provided to borrowers, all of them are NA. The risk rating is implemented after Prosper reopened from Q2 2009. The rating D, E, HR can borrow maximum amount around $15000, rating C is up to $25000, all rating AA, A, B are up to maximum loan amount $35000.
We can see plot are left skewed. it shows that rating E,HR borrowed less amount, but they have more delinquencies. while the credit rating increasing, the fewer number of delinquencies come out, even though the borrowers labeled lower risk borrowed more money.
As the loan term and risk rating increase, the interest rate will increase also, there is a strong positive linear correlation relationship between prosper rating and interest rate. if we see the loan amount and risk rating, we found the higher risk rating lead borrowers have to pay more interest rate, and borrowers can have less loan amount also.
Investors are also more likely to invest their money to those borrowers they have lower risk and good credit rating, because they don’t want to lost their money, but still can have yield from this.
Further study the current delinquencies of borrowers , that also explains that high-risk borrowers generally have a higher default rate.
There are almost 30000 loan listing created before Q2 2009 are not have rating provided. The risk rating from the highest AA to the lowers HR appears Binomial distribution, 57% of borrowers are labeled middle risk B,C and D. Borrowers with rating C with 22% the highest frequency.
The rating E and HR has smaller IQR compared to the rest of the rating, rating B and C has no outliner, rating A,B has a smaller outliner, rating AA has two outline only. With a correlation of -0.43, it shows a negative moderately strong linear relationship, as the risk rating increase, the lower amount would be borrowed from prosper.
There is a strong positive linear correlation relationship 0.95 between prosper rating and interest rate. The fixed (3 0r 5 years) loan Term which borrowers select actually causes different borrower rate on each risk rating.
The dataset is from the Prosper loans, it’s America’s first peer-to-peer lending marketplace, where people could loans money by listing it in the website. There are 113937 observations and 81 variables in the dataset.
I extract 15 features and explore loan term distribution, the borrowers’ Occupation, rating, annual incomes, employmentStatus, and what purpose they loan from Prosper. There almost 80% of the borrowers selected 3 years loan term. I am surprise that there are 1614 listing was created from 2010 to 2013 with 1 year loan term, prosper.com doesn’t provide such term for loan. Not sure if prosper opened for this term or what problems caused this data issue. While we seeing Occupation of borrowers, the top 1 is Other, it’s understandable because all not in the website selecting list can be other, the second to third are professional and computer programer. The prosper rating are not provided before Q2 2009, Ratings, from lowest-risk to highest-risk, are labeled AA, A, B, C, D, E, and HR (“High Risk”), the distributions of rating follow an almost unimodal distribution, with the peak of 18345. The borrowers incomes range distributions also looks follow an almost unimodal distribution, we noticed that there are 621 loan list with zero annual income, then I further explored their employment status when they creating list, most of them are in employed status, hence, those zero incomes should be data issue. Finally, I explore the purpose of borrowers loan from prosper, found the highest category is 1 - Debt Consolidation, where’s one take out a loan to pay many others. we may guess it could be that many Prosper visitors comes with already have loans, and want to search some loans to pay for it.
Prosper rating is a very important feature, looking at the statistics and plot, Prosper Rating is strongly correlated with borrower’s interest rate, the number of investors, and the LenderYield. It can be explained that this lower risk rating the borrowers labeled, the lower borrower’s interest rate would have, and will attract more investors also. For lenders, if they would like to fund for those borrowers labeled higher risk rating, the higher yield will have, but on other hand, it will increase the money lost rate.
As I only select 15 features for this dataset, there must be additional features also important to expplore the prosper loan data, we can involve more features regarding to borrowers, lenders or prosper.org to explore more interesting results.
Udacity: https://classroom.udacity.com/nanodegrees/nd002-cn-advanced/parts/7f46cd58-8041-4d9d-88a5-4b7c6f7be63e
wikipedia: https://en.wikipedia.org/wiki/Prosper_Marketplace
ggplot: http://ggplot2.org/
Variable Definitions: https://docs.google.com/document/d/1qEcwltBMlRYZT-l699-71TzInWfk4W9q5rTCSvDVMpc/pub?embedded=true